Exploration of Drivers and Model Forms

Author
Affiliation

Gulf of Maine Research Institute

Published

March 21, 2024

Sea Surface Temperature and Size Change Expectations

Code
library(tidyverse)
library(gmRi)
library(tidyverse)
library(ggeffects)
library(zoo)
library(lme4)
library(mgcv)
library(patchwork)
library(emmeans)
library(rstatix)
library(ggpubr)

theme_set(theme_gmri())

#--------------------
## Set up
#--------------------

# Load the data 
df <- read_csv(here::here("data/size_and_spectra_model_data.csv"))

# Get raw landings
landings_raw <- read_csv(here::here("data/unscaled_spectra_predictor_dataset_wide.csv")) %>% 
  select(year, survey_area, landings_raw = landings)

# Get the raw sst from oisst
sst_raw = read_csv(here::here("data/unscaled_regional_sst.csv"))

# Put it all together and clarify the column names
df <- df %>% 
  select(-sst) %>% 
  left_join(landings_raw) %>% 
  left_join(sst_raw) %>% 
  mutate(
    survey_area = factor(
      survey_area, 
      levels = c("Gulf of Maine", "Georges Bank", 
                 "Southern New England", "Mid-Atlantic Bight")),
    region = str_replace_all(survey_area, "-| ", "_"),
    region = factor(
      region, 
      levels = c("Gulf_of_Maine", "Georges_Bank", 
                 "Mid_Atlantic_Bight", "Southern_New_England")))

Raw SST should (and does) increase with lower latitudes.

The temperature size rule does not predict that the mean/median size of a community should be correlated to the ambient temperature.

The temperature-size rule (TSR) is a widespread phenomenon, which describes the phenotypic plastic response of species’ size to temperature: individuals reared at colder temperatures mature as larger adults than at warmer temperatures.

Though this may be the case, this is not what it contends with directly, or that is not my understanding of it. I am operating under an assumption that each region is composed of individuals that on-average are living under temperatures that would not be considered “colder” or “warmer” than their preference.

Code
p1 <- ggplot(df) +
  geom_point(aes(sst, med_len, color = survey_area)) +
  geom_smooth(method = "lm", aes(sst, med_len)) +
  labs(x = "SST", y = "Median Length", title = "Median length is Negatively\nCorrelated With SST")

p2 <- ggplot(df) +
  geom_point(aes(sst, med_wt, color = survey_area)) +
  geom_smooth(method = "lm", aes(sst, med_wt)) +
  labs(x = "SST", y = "Median Weight", title = "Median Weight is as Well")

p3 <- ggplot(df) +
  geom_point(aes(sst, b, color = survey_area)) +
  geom_smooth(method = "lm", aes(sst, b)) +
  labs(x = "SST", y = "Size Spectra Slope", title = "Size Spectra Slope Also Appears to Be...\nBut it is unclear why it would.") 


p1 + p2 + p3 + guide_area() + plot_layout(ncol = 2, guides = "collect")

Anomalies are based off a 30-year climatology for each respective region, and should capture departures from the long-term average. This also helps out with a problem of modeling SST x Region interactions, where the distributions of sst for the regions do not overlap well. In other words, the mid-Atlantic Bight is always warmer than southern new england, which is usually warmer than Georges or GOM.

The TSR contends that growth rates for species reared under higher temperatures show a plastic response where growth rates at early ages are elevated, and maturation happens earlier and at smaller sizes.

This effect IMO is better seen through anomalies, and also would not expect not operate differently between these regions.

Code
p1 <- ggplot(df) +
  geom_boxplot(aes(x = survey_area, y = sst, color = survey_area)) +
  labs(x = "Area", y = "OISST", title = "There are Different Baseline SSTs\nAmong Regions")

p2 <- ggplot(df, aes(x = survey_area, y = sst_anom, color = survey_area)) + 
  geom_boxplot() +
  labs(title = "Anomalies (or scaled values)\nFrame in terms of regional average", x = "Area", y = "OISST Anomaly")


p3 <- ggplot(drop_na(df, sst), aes(year, sst_anom, color = survey_area)) +
  geom_point(alpha = 0.4) +
  geom_smooth(se = F, linewidth = 1) +
  labs(title = "All regions have elevated\nSST conditions in last decade", x = "Year", y = "OISST Anomalies")


p4 <- ggplot(drop_na(df, ersst), aes(year, ersst_anom, color = survey_area)) +
  geom_point(alpha = 0.4) +
  geom_smooth(se = F, linewidth = 1) +
  labs(title = "Departure from long-term trend\nstarts in 2000's", x = "Year", y = "ERSST Anomalies")


p1 + p3 + p2 + p4 + plot_layout(ncol = 2, guides = "collect")

How do anomalies look with respect to these body-size metrics?

In a bit of a surprise to me, there does not seem to be any clear signal that any of these body-size related indicators are correlated with SSt anomalies.

Code
p1 <- ggplot(df, aes(sst_anom, med_len)) +
  geom_point(aes(color = survey_area)) +
  geom_smooth(method = "lm", color = "black") +
  #geom_smooth(method = "lm", aes(color = survey_area), se = F) +
  labs(x = "SST Anomaly", y = "Median Length")

p2 <-  ggplot(df, aes(sst_anom, med_wt)) +
  geom_point(aes(color = survey_area)) +
  geom_smooth(method = "lm", color = "black") +
  #geom_smooth(method = "lm", aes(color = survey_area), se = F) +
  labs(x = "SST Anomaly", y = "Median Weight")

p3 <- ggplot(df, aes(sst_anom, b)) +
  geom_point(aes(color = survey_area)) +
  geom_smooth(method = "lm", color = "black") +
  #geom_smooth(method = "lm", aes(color = survey_area), se = F) +
  labs(x = "SST Anomaly", y = "Size Spectra Slope") 


p1 + p2 + p3 + guide_area() + plot_layout(ncol = 2, guides = "collect") + plot_annotation(title = "SST anomalies show no global correlation, could be an interaction situation, could be nothing...")

Its possible that maybe the regions would have some differential response. However, I wouldn’t anticipate them to behave differently under expectations from the TSR.

It also does not appear that there is much evidence for that either…

Code
p1 <- ggplot(df, aes(sst_anom, med_len, color = survey_area)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(x = "SST Anomaly", y = "Median Length")

p2 <-  ggplot(df, aes(sst_anom, med_wt, color = survey_area)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(x = "SST Anomaly", y = "Median Weight")

p3 <- ggplot(df, aes(sst_anom, b, color = survey_area)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(x = "SST Anomaly", y = "Size Spectra Slope") 


p1 + p2 + p3 + guide_area() + plot_layout(ncol = 2, guides = "collect") + plot_annotation(title = "SST Anomaly x Region Interaction")

I would almost expect SST anomalies to have a non-linear response shape of some dome-like shape mirroring a thermal preference curve.

This visual check isn’t very convincing of that though.

Code
ggplot(df, aes(sst_anom, b, color = survey_area)) +
  geom_point(aes()) +
  geom_smooth(methog = "gam") +
  facet_wrap(~survey_area, ncol = 2)

Should we Drop SST/SST Anomalies?

From Bart:

Therefore, I think the best approach is to drop SST and focus on year as the predictor (and a proxy for SST).

My Thoughts:
Why would we want a proxy for SST when we have SST? Wouldn’t it be more prudent to make a statement on a lack of SST’s association with median size and spectra slope?

Including region and year brings in additional complications of collinearity. Each of the other variables shows trends through time. While it would be possible to follow a differencing procedure (e.g. remove the temporal trend) in these variables, I think a simpler and must easier approach is to just include year as both a predictor and a random effect, and remove all other covariates. What we are doing here is assuming that there is lots of stuff that differs with year, but we are going to push all that variation into the random effect component of the model in order to focus on the temporal trends by region.

My Thoughts:
I can see some of the rationale of going this route for the convenience of navigating the multicollineary of time+sst+landings.

But this option is probably the furthest from testing a hypothesis for either landings or SST.

I believe we’d have a decent model in terms of unbiased predictions and decent performance, but it seems like kind of a hands in the air for our original hypotheses and just focusing on whether there were changes over time.

I’m also confused whether you can/should use year both as a fixed and a random. And also whether you are using it as a continuous variable for fixed and intended to use it as a categorical variable in the random effects term. I don’t think random effects can be continuous variables.

Year as a Random Effect

I’m confused how year is being treated when it is in the model as a fixed term AND a random intercept term:

The model formula you had in your summary resembles what I would anticipate for having “year” as a continuous fixed effect.

Code
# Model summaries look the same if you enforce year as a factor for random effect or not
# ggpredict plots do not.
mod_regionXyear <- lmer(b ~ year*survey_area + (1|year), data = df)

cont <- emmeans::emtrends(mod_regionXyear, pairwise ~ survey_area, var = "year")

p1 <- as.data.frame(cont$emtrends) %>%
  ggplot(aes(y = fct_rev(survey_area), x = year.trend))+
  geom_pointrange(aes(xmin = lower.CL, xmax = upper.CL))+
  geom_vline(xintercept = 0, linetype = 4)+
  labs(x = "Coefficient value", y = "", title = "Mixed-Effects Model:\nRegion Fixed Effects")+
  theme_classic()

p2 <- plot(ggpredict(mod_regionXyear, ~year+survey_area), add.data = T) + 
  ggtitle("Model Formula:\nyear * region + (1 | year)")

p1 | p2

This is what I imagine a random effect for year would produce.

Code
# Enforcing year as a factor
mod_regionXyear <- lmer(b ~ year*survey_area + (1|factor(year)), data = df)

cont <- emmeans::emtrends(mod_regionXyear, pairwise ~ survey_area, var = "year")

p1 <- as.data.frame(cont$emtrends) %>%
  ggplot(aes(y = fct_rev(survey_area), x = year.trend))+
  geom_pointrange(aes(xmin = lower.CL, xmax = upper.CL))+
  geom_vline(xintercept = 0, linetype = 4)+
  labs(x = "Coefficient value", y = "", 
       title = "Mixed-Effects Model:\nRegion Fixed Effects")+
  theme_classic()

p2 <- plot(ggpredict(mod_regionXyear, ~year+survey_area), add.data = T) + 
  ggtitle("Model Formula:\nyear * region + (1 | factor(year))")

p1 | p2

Or use much simpler model?

If we care about the change over time differences I think we’d want year as a fixed effect. And indeed the coefficent estimates are the same.

Code
simple_mod <- lm(b ~ year*survey_area, data = df)
simple_cont <- emmeans::emtrends(simple_mod, pairwise ~ survey_area, var = "year")

p1 <- as.data.frame(simple_cont$emtrends) %>%
  ggplot(aes(y = fct_rev(survey_area), x = year.trend))+
  geom_pointrange(aes(xmin = lower.CL, xmax = upper.CL))+
  geom_vline(xintercept = 0, linetype = 4)+
  labs(x = "Coefficient value", y = "", title = "Fixed-Effects Only Model:\n Region Intercepts")+
  theme_classic()

p2 <- plot(ggpredict(simple_mod, ~year+survey_area), add.data = T) + ggtitle("Model Formula:\nyear * region")

p1 | p2

Proposing Alternative Random Effects Structure

I think there is some sense to have year as a R.E. It can help soak up some variance for things that happened to the four regions on any year that we don’t have measurements for (stratification dynamics, weather patterns, GSI, etc)

But I don’t know whether it should also be a fixed effect in the same model.

I don’t really care about whether regions have changed over time, our hypotheses are both focused on whether change can be associated with warming or fishing removals. So what if we swapped in sst?

Code
# New model:
mod_regionXsst_anom <- lmer(
  b ~ sst_anom * survey_area + (1|year), 
  data = df)

# Get contrasts
cont <- emmeans::emtrends(mod_regionXsst_anom, pairwise ~ survey_area, var = "sst_anom")


p1 <- as.data.frame(cont$emtrends) %>%
  ggplot(aes(y = fct_rev(survey_area), x = sst_anom.trend))+
  geom_pointrange(aes(xmin = lower.CL, xmax = upper.CL))+
  geom_vline(xintercept = 0, linetype = 4)+
  labs(
    x = "Coefficient value", 
    y = "", 
    title = "Mixed-Effects Model Formula::\nb ~ sst_anom * region + (1 | year)\nRegion Intercepts")+
  theme_classic()


# Predictions
p2 <- plot(ggpredict(mod_regionXsst_anom, ~year+survey_area), add.data = T) + ggtitle("Year x Slope No Relationship")

p3 <- plot(ggpredict(mod_regionXsst_anom, ~sst_anom+survey_area), add.data = T) + ggtitle("sst_anom x Region Interaction Seems Like Something Now")


p1 / p2 / p3


What about landings?

Landings are the other covariate that we have a defined hypothesis for.

Landings are also collinear with year.

Landings also are highest for Georges Bank and Gulf of Maine before SST is available. So there is a data availability challenge for trying to work with the two covariates in addition to multicollinearity….

As far as a picking an informed model structure: - Similar value overlap issue that raw SST had, where the landings values span different ranges
- I would want to have a regional interaction, because the species targeted and the fishing means will be different, likely leading to different impacts

So if I were just doing landings I would probably start here:

Code
p1 <- ggplot(df,  aes(year, landings,  color = survey_area)) +
  geom_point() +
  geom_smooth(method = "gam") +
  labs(title = "Relative Regional Landings by Area")


p2 <- ggplot(df,  aes(landings, med_len, color = survey_area)) +
  geom_point() +
  geom_smooth(method = "gam") +
  labs(title = "median length ~ s(landings_scaled)")

p3 <- ggplot(df,  aes(landings, b, color = survey_area)) +
  geom_point() +
  geom_smooth(method = "gam") +
  labs(title = "b ~ s(landings_scaled)")

p1 + p2 + p3 + guide_area() + plot_layout(ncol = 2, guides = "collect")

This suggests to me that:

harvest is not a strong selective force for smaller body sizes from 1970-2019 in the Northwestern Atlantic. With some potential interaction of importance in the Gulf of Maine.

Which could support the notion that:

Reductions in harvest are indeed weakening harvest induced declines in body size.

However, the anticipated potential increases in body size due to lower harvest are now masked by increases in temperature.

The data offers evidence that declines in body size in the fish community in the Northwest Atlantic are due to relative increases in regional sst since 1982.


Testing What we Care about:

Here is how I think I would proceed:

What do We Know:

Changes in median body length and size spectrum slope have occurred within different regions of the Northeast shelf.

These regions have experienced increases in average temperature.

Ecological literature would suggest that elevated temperatures could impact the size structure of the community via mechanisms related to TSR theory.

The Northeast US is also home to a number of large-scale commercial Fisheries. Fisheries also have been shown to impact the size structure of communities through removal of larger individuals and selective pressures.

For 3/4 of our regions these two covariates are colinear. Making it difficult to use either in a model together. They are also both colinear with year further complicating things.

Code
library(ggpmisc)

p1 <- ggplot(df, aes(sst_anom, landings)) +
  geom_point() + 
  geom_smooth(method = "lm") +
  stat_poly_eq() +
  scale_y_continuous(expand = expansion(add = c(0,2))) +
  facet_wrap(~survey_area)

p2 <- ggplot(df, aes(year, sst_anom)) +
  geom_point() + 
  geom_smooth(method = "lm") +
  stat_poly_eq() +
  scale_y_continuous(expand = expansion(add = c(0,2))) +
  facet_wrap(~survey_area)

p3 <- ggplot(df, aes(year, landings)) +
  geom_point() + 
  geom_smooth(method = "lm") +
  stat_poly_eq() +
  scale_y_continuous(expand = expansion(add = c(0,2))) +
  facet_wrap(~survey_area)

p1 + p2 + p3 + plot_layout(ncol = 2) + plot_annotation(title = "Three-way colinearity")

Should we even use RE?

I think there is a good case to make for regions to be treated as random effects. With any of the following reasons contributing to samples within the regions not being statistically independent, among other possibilities:
- differential species composition
- different mixing/productivity regimes
- different fisheries

I just know less what this means in terms of any our hypotheses.

Code
length_noyr_mm <- lmer(med_len ~ (1 | survey_area) * landings + sst_anom, data = df )

Moving Forward: Hypothesis Based Model Structures

From a hypothesis standpoint I would anticipate the following: 1. SSt anomalies should have a global non-linear impact effect that is stable across regions (no interaction with region)
- Landings could very likely differ between region due to different fisheries, and different community compositions (interaction with landing)
- Regions have different community structure, bathymetry, current dynamics, and connectedness to different sized nearshore habitats. I would expect each to vary (intercept)

Modeling without Year

From the above hypotheses, this is the linear model structure I would use. Year is not included as a term in these models.

Median Length Model:

Code
# Linear model with our primary hypotheses
length_noyr <- lm(med_len ~ region * landings + sst_anom, data = df )

summary(length_noyr)

Call:
lm(formula = med_len ~ region * landings + sst_anom, data = df)

Residuals:
     Min       1Q   Median       3Q      Max 
-14.8119  -2.4864  -0.3935   2.4744  12.3483 

Coefficients:
                                     Estimate Std. Error t value Pr(>|t|)    
(Intercept)                          26.46099    0.72904  36.296   <2e-16 ***
regionGeorges_Bank                   -0.09738    1.02173  -0.095   0.9242    
regionMid_Atlantic_Bight            -10.48518    1.01415 -10.339   <2e-16 ***
regionSouthern_New_England           -9.92448    1.00205  -9.904   <2e-16 ***
landings                              1.56107    0.80875   1.930   0.0556 .  
sst_anom                             -0.11020    0.61238  -0.180   0.8574    
regionGeorges_Bank:landings          -0.96418    1.03600  -0.931   0.3536    
regionMid_Atlantic_Bight:landings    -2.16327    1.04527  -2.070   0.0403 *  
regionSouthern_New_England:landings  -1.71460    0.99528  -1.723   0.0871 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.118 on 142 degrees of freedom
  (49 observations deleted due to missingness)
Multiple R-squared:  0.6101,    Adjusted R-squared:  0.5881 
F-statistic: 27.77 on 8 and 142 DF,  p-value: < 2.2e-16
Code
# Get contrasts
cont <- emmeans::emtrends(length_noyr, pairwise ~ region, var = "landings")

# Plot contrasts
p1 <- as.data.frame(cont$emtrends) %>%
  ggplot(aes(y = fct_rev(region), x = landings.trend))+
  geom_pointrange(aes(xmin = lower.CL, xmax = upper.CL))+
  geom_vline(xintercept = 0, linetype = 4)+
  labs(
    x = "Coefficient value", 
    y = "", 
    title = "Fixed-Effects lm Formula::\nmed_length ~ sst_anom + landings * region")+
  theme_classic()


# Predictions
p2 <- plot(ggpredict(length_noyr, ~sst_anom), add.data = T) + labs(y = "Median Length", x = "SST Anomaly")

p3 <- plot(ggpredict(length_noyr, ~landings*region), add.data = T)  + labs(y = "Median Length", x = "Standardized Regional landings")


p1 / p2 / p3

Code
# Augment with Model fit and residuals:
df_aug <- augment(length_noyr, drop_na(df, sst_anom, landings)) %>% 
  mutate(resid_col = ifelse(.resid > 0, "darkred", "royalblue"))


# Temporal Patterns in Residuals
ggplot(df_aug) +
  geom_hline(yintercept = 0) +
  geom_line(aes(year, med_len, color = "Observed")) +
  geom_line(aes(year, .fitted, color = "Model Fit")) +
  facet_wrap(~region) +
  labs(y = "Residuals", title = "Model fits")

Code
# Temporal Patterns in Residuals
ggplot(df_aug) +
  geom_hline(yintercept = 0) +
  geom_line(aes(year, .resid), color = "gray") +
  geom_point(aes(year, .resid, color = I(resid_col))) +
  facet_wrap(~region) +
  labs(y = "Residuals", title = "Temporal Patterns in Residuals")

Spectra Slope Model:

Code
# Linear model with our primary hypotheses
b_noyr <- lm(b ~ region * landings + sst_anom, data = df )

summary(b_noyr)

Call:
lm(formula = b ~ region * landings + sst_anom, data = df)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.160563 -0.036144 -0.007886  0.033354  0.139481 

Coefficients:
                                     Estimate Std. Error t value Pr(>|t|)    
(Intercept)                         -0.922175   0.009809 -94.013  < 2e-16 ***
regionGeorges_Bank                  -0.012645   0.013747  -0.920   0.3592    
regionMid_Atlantic_Bight            -0.112674   0.013645  -8.258 9.35e-14 ***
regionSouthern_New_England          -0.117426   0.013482  -8.710 7.09e-15 ***
landings                             0.017087   0.010881   1.570   0.1186    
sst_anom                            -0.003155   0.008239  -0.383   0.7024    
regionGeorges_Bank:landings         -0.008689   0.013939  -0.623   0.5340    
regionMid_Atlantic_Bight:landings   -0.034807   0.014064  -2.475   0.0145 *  
regionSouthern_New_England:landings -0.007334   0.013391  -0.548   0.5848    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.05541 on 142 degrees of freedom
  (49 observations deleted due to missingness)
Multiple R-squared:  0.5207,    Adjusted R-squared:  0.4937 
F-statistic: 19.28 on 8 and 142 DF,  p-value: < 2.2e-16
Code
# Get contrasts
cont <- emmeans::emtrends(b_noyr, pairwise ~ region, var = "landings")

# Plot contrasts
p1 <- as.data.frame(cont$emtrends) %>%
  ggplot(aes(y = fct_rev(region), x = landings.trend))+
  geom_pointrange(aes(xmin = lower.CL, xmax = upper.CL))+
  geom_vline(xintercept = 0, linetype = 4)+
  labs(
    x = "Coefficient value", 
    y = "", 
    title = "Fixed-Effects lm Formula::\nb ~ sst_anom + landings * region")+
  theme_classic()


# Predictions
p2 <- plot(ggpredict(b_noyr, ~sst_anom), add.data = T) + labs(y = "Median Length", x = "SST Anomaly")

p3 <- plot(ggpredict(b_noyr, ~landings*region), add.data = T)  + labs(y = "Median Length", x = "Standardized Regional landings")


p1 / p2 / p3

Code
# Augment with Model fit and residuals:
df_aug <- augment(b_noyr, drop_na(df, sst_anom, landings)) %>% 
  mutate(resid_col = ifelse(.resid > 0, "darkred", "royalblue"))


# Temporal Patterns in Residuals
ggplot(df_aug) +
  geom_hline(yintercept = 0) +
  geom_line(aes(year, b, color = "Observed")) +
  geom_line(aes(year, .fitted, color = "Model Fit")) +
  facet_wrap(~region) +
  labs(y = "Residuals", title = "Model fits")

Code
# Temporal Patterns in Residuals
ggplot(df_aug) +
  geom_hline(yintercept = 0) +
  geom_line(aes(year, .resid), color = "gray") +
  geom_point(aes(year, .resid, color = I(resid_col))) +
  facet_wrap(~region) +
  labs(y = "Residuals", title = "Temporal Patterns in Residuals")


Supplemental Test 2: Have Regions Seen Changes in Slope/Size

This would be preliminary/supplemental exploration of whether median size or size spectra slopes have changed within regions.

These would be straightforward linear models with a region interaction. They provide validity to statements of change over time and differences between regions, but do not speak to hypotheses on why those changes have occurred.

Code
# ANCOVA is just a slope and intercept interaction model
length_change_lm <- lm(med_len ~ year * region, data = df)

# Test homegeneity of slopes
df %>% anova_test(med_len ~ region * year)
Effect DFn DFd F p p<.05 ges
region 3 192 103.611 0.000 * 0.618
year 1 192 4.182 0.042 * 0.021
region:year 3 192 5.288 0.002 * 0.076
Code
# They aren't homogenous - pairwise
slopes_lst <- emtrends(length_change_lm, ~ region, var = "year")
slopes_lst          # slope estimates and CIs
 region               year.trend     SE  df lower.CL upper.CL
 Gulf_of_Maine           -0.1449 0.0408 192 -0.22549 -0.06436
 Georges_Bank            -0.0718 0.0408 192 -0.15240  0.00873
 Mid_Atlantic_Bight       0.0798 0.0408 192 -0.00072  0.16041
 Southern_New_England    -0.0302 0.0408 192 -0.11072  0.05041

Confidence level used: 0.95 
Code
pairs(slopes_lst)   # comparisons
 contrast                                  estimate     SE  df t.ratio p.value
 Gulf_of_Maine - Georges_Bank               -0.0731 0.0578 192  -1.265  0.5862
 Gulf_of_Maine - Mid_Atlantic_Bight         -0.2248 0.0578 192  -3.891  0.0008
 Gulf_of_Maine - Southern_New_England       -0.1148 0.0578 192  -1.987  0.1966
 Georges_Bank - Mid_Atlantic_Bight          -0.1517 0.0578 192  -2.626  0.0458
 Georges_Bank - Southern_New_England        -0.0417 0.0578 192  -0.722  0.8884
 Mid_Atlantic_Bight - Southern_New_England   0.1100 0.0578 192   1.904  0.2297

P value adjustment: tukey method for comparing a family of 4 estimates 
Code
# df %>% 
#   emmeans_test(
#     formula = med_len ~ region, 
#     covariate = year,
#     ref.group = "all",
#     p.adjust.method = "bonferroni") %>% 
#   add_xy_position(
#     x = "region", fun = "mean_se")


#---- below might depend on slope assumption  ------

# ANOVA test for between significance
res.aov <- df %>% anova_test(med_len ~ year + region)
get_anova_table(res.aov)
Effect DFn DFd F p p<.05 ges
year 1 195 3.924 0.049 * 0.020
region 3 195 97.198 0.000 * 0.599
Code
# Pairwise comparisons


# Get pairwise results for the factors
pwc <- df %>% 
  emmeans_test(
    med_len ~ region, 
    covariate = year,
    p.adjust.method = "bonferroni") %>% 
  add_xy_position(
    x = "region", fun = "mean_se")


# Display the adjusted means of each group
# Also called as the estimated marginal means (emmeans)
get_emmeans(pwc) %>% 
  ggplot(aes(emmean, fct_rev(region))) + 
  geom_pointrange(aes(xmin = conf.low, xmax = conf.high)) +
  theme_gmri() +
  labs(
    x = "Median Length",
    y = "Region",
    title = "GB + GOM significantly Longer Lengths than MAB & SNE\n
                Not significantly different within the pairs")

Code
# Visualization: line plots with p-values
get_emmeans(pwc) %>% 
  left_join(select(df, survey_area, region)) %>% 
  ggline(x = "survey_area", y = "emmean") +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.2) + 
  stat_pvalue_manual(pwc, hide.ns = TRUE, tip.length = FALSE) +
  labs(
    subtitle = get_test_label(res.aov, detailed = TRUE),
    caption = get_pwc_label(pwc)) +
  labs(y = "Median Length (cm)", x = "Region")

Code
length_change_lm <- lm(med_len ~ year * survey_area, data = df)
b_change_lm <- lm(b ~ year * survey_area, data = df)

# Inspect the model diagnostic metrics
library(broom)
length_metrics <- augment(length_change_lm) 
b_metrics <- augment(b_change_lm)